113 research outputs found

    Simple Regret Optimization in Online Planning for Markov Decision Processes

    We consider online planning in Markov decision processes (MDPs). In online planning, the agent focuses on its current state only, deliberates about the set of possible policies from that state onwards and, when interrupted, uses the outcome of that exploratory deliberation to choose what action to perform next. The performance of algorithms for online planning is assessed in terms of simple regret, which is the agent's expected performance loss when the chosen action, rather than an optimal one, is followed. To date, state-of-the-art algorithms for online planning in general MDPs are either best effort, or guarantee only polynomial-rate reduction of simple regret over time. Here we introduce a new Monte-Carlo tree search algorithm, BRUE, that guarantees exponential-rate reduction of simple regret and error probability. This algorithm is based on a simple yet non-standard state-space sampling scheme, MCTS2e, in which different parts of each sample are dedicated to different exploratory objectives. Our empirical evaluation shows that BRUE not only provides superior performance guarantees, but is also very effective in practice and compares favorably with the state of the art. We then extend BRUE with a variant of "learning by forgetting." The resulting set of algorithms, BRUE(alpha), generalizes BRUE, improves the exponential factor in the upper bound on its reduction rate, and exhibits even more attractive empirical performance.

    Landmarks, Critical Paths and Abstractions: What's the Difference Anyway?

    Current heuristic estimators for classical domain-independent planning are usually based on one of four ideas: delete relaxation, abstraction, critical paths, and, most recently, landmarks. Previously, these different ideas for deriving heuristic functions were largely unconnected. In my talk, I will show that these heuristics are in fact very closely related. Moreover, I will introduce a new admissible heuristic called the landmark cut heuristic, which exploits this relationship. In our experiments, the landmark cut heuristic provides better estimates than other current admissible planning heuristics, especially on large problem instances.

    Graphically structured value-function compilation

    Classical work on eliciting and representing preferences over multi-attribute alternatives has attempted to recognize conditions under which value functions take on a particularly simple and compact form, making their elicitation much easier. In this paper we consider preferences over discrete domains, and show that for a certain class of simple and intuitive qualitative preference statements, one can always generate compact value functions consistent with these statements. These value functions maintain the independence structure implicit in the original statements. For discrete domains, these representation theorems are much more general than previous results. However, we also show that it is not always possible to maintain this compact structure if we add explicit ordering constraints among the available outcomes.

    Sensor networks and distributed CSP: communication, computation and complexity

    We introduce SensorDCSP, a naturally distributed benchmark based on a real-world application that arises in the context of networked distributed systems. In order to study the performance of Distributed CSP (DisCSP) algorithms in a truly distributed setting, we use a discrete-event network simulator, which allows us to model the impact of different network traffic conditions on the performance of the algorithms. We consider two complete DisCSP algorithms, asynchronous backtracking (ABT) and asynchronous weak commitment search (AWC), and compare their performance on both satisfiable and unsatisfiable instances of SensorDCSP. We found that random delays (due to network traffic or in some cases actively introduced by the agents) combined with a dynamic decentralized restart strategy can improve the performance of DisCSP algorithms. In addition, we introduce GSensorDCSP, a plain-embedded version of SensorDCSP that is closely related to various real-life dynamic tracking systems. We perform both an analytical and an empirical study of this benchmark domain. In particular, this benchmark allows us to study the attractiveness of solution repairing for solving a sequence of DisCSPs that represent the dynamic tracking of a set of moving objects. This work was supported in part by AFOSR (F49620-01-1-0076, Intelligent Information Systems Institute and MURI F49620-01-1-0361), CICYT (TIC2001-1577-C03-03 and TIC2003-00950), DARPA (F30602-00-2-0530), an NSF CAREER award (IIS-9734128), and an Alfred P. Sloan Research Fellowship. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the US Government.